Fix: gpu detection when multiple GPUs exist by JannikSt · Pull Request #275 · PrimeIntellect-ai/protocol

JannikSt · 2025-04-18T10:13:03Z

No description provided.

Copilot

Pull Request Overview

This PR refactors GPU detection and handling to support systems with multiple GPUs by replacing a boolean GPU flag with a more detailed GPU specifications structure. Key changes include:

Replacing the boolean has_gpu flag with an Option throughout the Docker service and CLI layers.
Updating the DockerManager to use the GPU specifications for device requests.
Modifying GPU detection logic to collect and select multiple GPU devices.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
worker/src/docker/service.rs	Switched from a boolean flag to an Option for GPU usage.
worker/src/docker/docker_manager.rs	Updated the start_container parameter and host configuration to use GPU specs.
worker/src/cli/command.rs	Adjusted CLI command construction to reflect the new GPU model.
worker/src/checks/hardware/hardware_check.rs	Changed GPU collection to select the main GPU from multiple GPUs.
worker/src/checks/hardware/gpu.rs	Redesigned GPU detection to aggregate multiple GPU devices.
shared/src/models/node.rs	Added an optional indices field to the GpuSpecs model.
discovery/src/api/routes/node.rs	Updated test fixtures to include GPU indices.

Comments suppressed due to low confidence (2)

worker/src/docker/docker_manager.rs:193

[nitpick] Consider renaming the variable 'gpu' to 'gpu_specs' for clarity, as it represents detailed GPU specifications rather than a simple boolean flag.

let host_config = if gpu.is_some() {

worker/src/docker/docker_manager.rs:200

[nitpick] Instead of using unwrap, consider using pattern matching (e.g., if let Some(gpu_specs) = gpu) to safely access 'indices'. This avoids any potential panic and makes the control flow clearer if 'indices' is unexpectedly None.

device_ids: gpu.unwrap().indices.map(|indices| indices.iter().map(|i| i.to_string()).collect()),

mattdf · 2025-04-18T12:33:30Z

+                let name = device.name().unwrap_or_else(|_| "Unknown".to_string());
+                let memory = device.memory_info().map(|m| m.total).unwrap_or(0);
+                let driver_version = nvml
+                    .sys_driver_version()
+                    .unwrap_or_else(|_| "Unknown".to_string());

-            GpuDevice::Available {
-                name,
-                memory,
-                driver_version,
-                device_count,
+                if let Some(existing_device) = device_map.get_mut(&name) {
+                    existing_device.count += 1;
+                    existing_device.indices.push(i as u32);
+                } else {
+                    device_map.insert(
+                        name.clone(),
+                        GpuDevice {
+                            name,
+                            memory,
+                            driver_version,
+                            count: 1,
+                            indices: vec![i as u32],
+                        },
+                    );
+                }


I looked up a few SMI examples and it seems like the "name" field is usually unique to a product, but also this doesn't seem to be guaranteed, e.g. with VRAM modded cards (that might get more popular in the future) it could cause errors not to check the VRAM available. It might be worthwhile appending the memory as a string to the name and using that as a key.

*use GPU type with highest count * mount specific GPU idx and not all

JannikSt added 4 commits April 18, 2025 12:12

adjust error to user_error

f23531e

use gpu with highest count

bb37cde

detect actual gpu indices

7f27313

cleanup and fix tests

c7c63a2

JannikSt requested a review from Copilot April 18, 2025 10:21

Copilot AI reviewed Apr 18, 2025

View reviewed changes

JannikSt added 4 commits April 18, 2025 12:27

avoid unwrap

17968dc

keep driver empty

6d61c17

fix device request

d36815f

fix docker device selection

1fb280e

mattdf approved these changes Apr 18, 2025

View reviewed changes

JannikSt merged commit 99725ee into develop Apr 18, 2025
1 check passed

JannikSt added a commit that referenced this pull request Apr 21, 2025

Fix: gpu detection when multiple GPUs exist (#275)

86f13cb

*use GPU type with highest count * mount specific GPU idx and not all

JannikSt added a commit that referenced this pull request Apr 21, 2025

Fix: gpu detection when multiple GPUs exist (#275)

ac8371e

*use GPU type with highest count * mount specific GPU idx and not all

JannikSt deleted the fix/gpu-detection branch June 25, 2025 13:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: gpu detection when multiple GPUs exist#275

Fix: gpu detection when multiple GPUs exist#275
JannikSt merged 8 commits into
developfrom
fix/gpu-detection

JannikSt commented Apr 18, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

mattdf Apr 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

JannikSt commented Apr 18, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

mattdf Apr 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants